ABSTRACT

We report results from the Supernova Photometric Classification Challenge (SNPhotCC), a publicly released mix of simulated supernovae (SNe), with types (Ia, Ibc, and II) selected in proportion to their expected rate. The simulation was realized in the griz filters of the Dark Energy Survey (DES) with realistic observing conditions (sky noise, point-spread function, and atmospheric transparency) based on years of recorded conditions at the DES site. Simulations of non-Ia type SNe are based on spectroscopically confirmed light curves that include unpublished non-Ia samples donated from the Carnegie Supernova Project (CSP), the Supernova Legacy Survey (SNLS), and the Sloan Digital Sky Survey-II (SDSS-II). A spectroscopically confirmed subset was provided for training. We challenged scientists to run their classification algorithms and report a type and photo-z for each SN. Participants from 10 groups contributed 13 entries for the sample that included a host-galaxy photo-z for each SN and 9 entries for the sample that had no redshift information. Several different classification strategies resulted in similar performance, and for all entries the performance was significantly better for the training subset than for the unconfirmed sample. For the spectroscopically unconfirmed subset, the entry with the highest average figure of merit for classifying SNe Ia has an efficiency of 0.96 and an SN Ia purity of 0.79. As a public resource for the future development of photometric SN classification and photo-z estimators, we have released updated simulations with improvements based on our experience from the SNPhotCC, added samples corresponding to the Large Synoptic Survey Telescope (LSST) and the SDSS-II, and provided the answer keys so that developers can evaluate their own analysis.

Subject headings: supernova light curve fitting and classification 
1. MOTIVATION

To explore the expansion history of the universe, increasingly large samples of high-quality SN Ia light curves are being used to measure luminosity distances as a function of redshift. With rapidly increasing sample sizes, there are not nearly enough resources to spectroscopically confirm each SN. Currently, the largest samples are from the Supernova Legacy Survey (SNLS: Astier et al. (2006)) and the Sloan Digital Sky Survey-II (SDSS-II: York et al. (2000); Frieman et al. (2008)), each with more than 1000 SNe Ia, yet fewer than half of their SNe are spectroscopically confirmed. The numbers of SNe are expected to increase dramatically in the coming decade: thousands for the Dark Energy Survey (DES: Bernstein et al. (2009)) and a few hundred thousand for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS)$^{30}$ and the Large Synoptic Survey Telescope (LSST: Ivezic et al. (2008); LSST Science Collaboration (2009)). Since only a small fraction of these SNe will be spectroscopically confirmed, photometric identification is crucial to fully exploit these large samples.

In the discovery phase of accelerated cosmological expansion, results were based on tens of high-redshift SNe Ia, and some samples included a significant fraction of events that were not classified from a spectrum (Riess et al. 1998, 2004; Perlmutter et al. 1997; Tonry et al. 2003). While human judgment played a significant role in classifying these "photometric" SNe, more formal methods of photometric classification have been developed over the past decade: Poznanski et al. (2002, 2007); Dahlén & Goobar (2002); Gal-Yam et al. (2004); Sullivan et al. (2006); Johnson & Crotts (2006); Kuznetsova & Connolly (2007); Kunz et al. (2007); Rodney & Tonry (2009). Some of these techniques have been used to select candidates for spectroscopic observations and rate measurements (Barris & Tonry 2006; Neill et al. 2006; Poznanski et al. 2007; Kuznetsova et al. 2008; Dilday et al. 2008), but these methods have not been used to select a significant photometric SN Ia sample for a Hubble-diagram analysis. In short, cosmological parameter estimates from the much larger recent surveys are based solely on spectroscopically confirmed SNe Ia (SNLS: Astier et al. (2006); ESSENCE: Wood-Vasey et al. (2007); CSP: Freedman et al. (2009); SDSS-II: Kessler et al. (2009a)).

The main reason for the current reliance on spectroscopic identification is that vastly increased spectroscopic resources have been used in these more recent surveys. In spite of these increased resources, however, more than half of the discovered SNe lack spectroscopic observations, and therefore photometric methods must be used to classify the majority of the SNe. There are two difficulties limiting the application of photometric classification. First is the lack of adequate non-Ia data for training algorithms. Many classification algorithms were developed using the publicly available Nugent templates,$^{31}$ consisting of a single spectral energy distribution (SED) template for each non-Ia type. The Nugent templates were constructed by averaging and interpolating a limited amount of spectroscopically confirmed non-Ia data (Levan et al. 2005; Hamuy et al. 2002; Gilliland et al. 1999; Baron et al. 2004; Cappellaro et al. 1997), and therefore the impact of the non-Ia diversity has not been well studied. The second difficulty is that there is no standard testing procedure, and therefore it is not clear which classification methods work best.

30 http://pan-starrs.ifa.hawaii.edu/public

31 http://supernova.lbl.gov/~nugent/nugent_templates.html

To aid in the transition to using photometric SN classification, we have released a public "SN Photometric Classification Challenge," hereafter called SNPhotCC. The announcement of the challenge and instructions to participants were given in a challenge release note (Kessler et al. 2010), and an electronic mail message alert was sent to several dozen SN experts. The SNPhotCC consisted of a blinded mix of simulated SNe, with types (Ia, Ib, Ic, II) selected in proportion to their expected rate. From 2010 January 29 through June 1, the public challenge was open for scientists to run their classification algorithms and report a type for each SN. A spectroscopically confirmed subset was provided so that algorithms could be tuned with a realistic training set. The goals of this challenge were to (1) learn the relative strengths and weaknesses of the different classification algorithms, (2) improve the algorithms, (3) understand what spectroscopically confirmed subsets are needed to properly train these algorithms, and (4) improve the simulations.

To address the paucity of non-Ia data, the CSP, SNLS, and SDSS-II generously contributed unpublished spectroscopically confirmed non-Ia light curves. These data are high-quality multiband light curves, and we are grateful to the donating collaborations. Since these non-Ia SNe are from surveys focused mainly on collecting type Ia SNe, this sample is brighter than the true non-Ia population. In spite of this bias toward brighter non-Ia, we anticipated that this challenge would be a useful step away from the overly simplistic studies that have relied on a handful of non-Ia templates.

The outline of this article is as follows. In §2 we present full details of the simulation, including strengths, weaknesses, and bugs found during the SNPhotCC. In §3 we describe the classification methods used by the 10 participating groups. The figure of merit used for evaluation is defined in §4, and the results for all of the SNPhotCC participants are presented in §5. Updated simulations are described in §6, and we conclude in §7.

2. THE SIMULATION 

Here we present full details of how the simulated samples were generated using the SNANA software package$^{32}$ (Kessler et al. 2009b). Both the strengths and weaknesses are discussed to motivate improvements in future simulations. The limited information available to participants during the challenge is given in §2 of the challenge release note (Kessler et al. 2010).

2.1. Simulation Overview 

The simulation was realized in the griz filters of the Dark Energy Survey (DES), and distances were calculated assuming a standard $\Lambda$CDM cosmology with $\Omega_M = 0.3$, $\Omega_\Lambda = 0.7$ and $w = -1$. The sky noise, point-spread function, and atmospheric transparency were evaluated in each filter and each epoch using a year-long history of actual conditions from the ESSENCE project at the Cerro Tololo Inter-American Observatory (CTIO).$^{33}$ For the five SN fields selected for the DES (3 deg$^2$ per field), the cadence was based on allocating 10% of the DES photometric observing time and most of the nonphotometric time. The cadence used in this publicly available simulation was generated by the Supernova Working Group within the DES collaboration.$^{34}$ Since the DES plans to collect data during five months of the year, incomplete light curves from temporal edge effects are included; i.e., the simulated explosion times extend well before the start of each survey season, and extend well beyond the end of the season.

32 http://www.sdss.org/supernova/SNANA.html

The SNPhotCC included a sample with a host-galaxy photometric redshift (SNPhotCC/HOSTZ) and another sample with no redshift information (SNPhotCC/noHOSTZ). For the former, the photo-z estimates were based on simulated galaxies (for DES) analyzed with the methods in Oyaizu et al. (2008a,b). The average host-galaxy photo-z resolution is 0.03, and the photo-z distribution includes non-Gaussian outliers. A challenge with precise spectroscopic redshifts was not given because using accurate redshifts makes little difference to the classifications compared with using a host-galaxy photo-z.

Two simple selection criteria were applied. First, each object must have an observation in two or more passbands with a signal-to-noise ratio (S/N) above 5. Second, there must be at least five observations after explosion, and there is no S/N requirement on these observations. These requirements are relatively loose because part of the challenge was to determine the optimal selection criteria. For the five seasons planned for the DES, the total number of generated SNe for all types was $1.01 \times 10^5$. The number satisfying the loose selection requirements and included in the SNPhotCC was $1.8 \times 10^4$.
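The two loose cuts can be sketched in a few lines of Python. This is only an illustration and not the SNANA implementation: the `Observation` record and the function name are hypothetical, and the first cut is read here as "at least two distinct passbands each contain an observation with S/N above 5," which is an assumption about the intended meaning.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical light-curve point; not an SNANA data structure."""
    band: str    # passband name: "g", "r", "i" or "z"
    mjd: float   # epoch of the observation (days)
    snr: float   # signal-to-noise ratio of the measured flux


def passes_loose_cuts(observations, t_explosion,
                      snr_min=5.0, min_bands=2, min_epochs=5):
    """Return True if a light curve satisfies both loose selection cuts."""
    # Cut 1: distinct passbands that have an observation with S/N above snr_min.
    bands_detected = {obs.band for obs in observations if obs.snr > snr_min}
    # Cut 2: number of observations after explosion, with no S/N requirement.
    n_after = sum(1 for obs in observations if obs.mjd > t_explosion)
    return len(bands_detected) >= min_bands and n_after >= min_epochs
```

Keeping the thresholds as keyword arguments makes it easy to tighten the cuts later, which matches the stated intent that finding the optimal selection criteria was part of the challenge.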

2.2. Type Ia Model

Simulated SNe Ia were based on an equal mix of the SALT-II (Guy et al. 2007) and MLCS (Jha et al. 2007; Kessler et al. 2009a) models. Since these two models do not agree in the ultraviolet region, we used a special MLCS-U2 version in which the ultraviolet region was adjusted to match that of the SALT-II model. The treatment of color variations corresponding to each model was used. For MLCS-U2, extinction by dust resulted in reddened SNe Ia. The dust parameter $R_V$ was drawn from an asymmetric Gaussian distribution peaked at $R_V = 2.0$ with sigmas of 0.2 and 0.5 for the low and high side, respectively, and the $R_V$ values were constrained to lie between 1.5 and 4.1; this $R_V$ distribution has a mean value of 2.2. For SALT-II, the color-magnitude adjustment was given by $\beta c$, where $\beta = 2.7$ and $c$ is the color excess, $E(B-V)$. The $c$ parameter was drawn from a Gaussian distribution with $\sigma_c = 0.1$ and the constraint $|c| < 0.4$.
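As an illustration of these two parameter distributions, the following Python sketch draws $R_V$ from a truncated asymmetric Gaussian and $c$ from a truncated Gaussian. It is not the SNANA code: the function names are hypothetical, the two halves of the asymmetric Gaussian are assumed to join continuously at the peak, the color distribution is assumed to be centered on zero, and truncation is done by simple rejection.

```python
import random


def draw_rv(rng, peak=2.0, sigma_lo=0.2, sigma_hi=0.5, rv_min=1.5, rv_max=4.1):
    """Draw R_V from an asymmetric Gaussian truncated to [rv_min, rv_max]."""
    while True:
        # Choose a side with probability proportional to its sigma, so that
        # the two half-Gaussians have the same height at the peak (assumption).
        if rng.random() < sigma_lo / (sigma_lo + sigma_hi):
            rv = peak - abs(rng.gauss(0.0, sigma_lo))
        else:
            rv = peak + abs(rng.gauss(0.0, sigma_hi))
        if rv_min <= rv <= rv_max:   # reject draws outside the allowed range
            return rv


def draw_salt2_color(rng, sigma_c=0.1, c_max=0.4):
    """Draw the SALT-II color c from a zero-mean Gaussian with |c| < c_max."""
    while True:
        c = rng.gauss(0.0, sigma_c)
        if abs(c) < c_max:
            return c


rng = random.Random(12345)
rv_sample = [draw_rv(rng) for _ in range(100_000)]
mean_rv = sum(rv_sample) / len(rv_sample)
c_sample = [draw_salt2_color(rng) for _ in range(10_000)]
```

Under these assumptions the sample mean of $R_V$ comes out close to the quoted mean of 2.2, which is a useful sanity check on the interpretation of "asymmetric Gaussian."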

33 The CTIO history of observing conditions is available in the 
public SNANA package (previous footnote). 

34 Although two of us (RK & SK) are members of the DES, 
we did not include other DES colleagues in any discussions about 
preparing the challenge, and we made our best efforts to prevent 
our DES collaborators from obtaining additional information be- 
yond that contained in the release note. 



In addition to the model parameters, we have simulated the anomalous Hubble scatter with random color variations. For each passband $f$, a random shift was drawn from a Gaussian distribution with $\sigma_m = 0.09$ mag, and this magnitude shift was applied coherently to all epochs within the passband. The scatter in each color was therefore $0.09 \cdot \sqrt{2}$ mag.
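The $\sqrt{2}$ factor follows because a color is the difference of two passband magnitudes, each carrying its own shift. A small Monte Carlo sketch makes this concrete; it assumes the shifts in different passbands are independent (which the $\sqrt{2}$ factor implies), and the names below are illustrative only.

```python
import random
import statistics

SIGMA_M = 0.09                  # mag, per-passband Gaussian sigma
BANDS = ("g", "r", "i", "z")


def draw_band_shifts(rng):
    """One magnitude shift per passband, applied coherently to all epochs."""
    return {band: rng.gauss(0.0, SIGMA_M) for band in BANDS}


rng = random.Random(2010)
g_minus_r_shifts = []
for _ in range(50_000):          # one iteration per simulated SN
    shifts = draw_band_shifts(rng)
    g_minus_r_shifts.append(shifts["g"] - shifts["r"])

color_scatter = statistics.pstdev(g_minus_r_shifts)
expected_scatter = SIGMA_M * 2 ** 0.5
```

Because the shift is constant across epochs within a passband, it changes the colors of a simulated SN without distorting the light-curve shape in any single band.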

For the SNPhotCC/noHOSTZ it is important to include photometric passbands that correspond to rest-frame wavelengths outside the nominally defined ranges of the SN Ia models: specifically, the g and r bands at higher redshifts that probe the far ultraviolet region. Without an estimate of the redshift, analysis programs cannot initially select observations that correspond to a particular rest-frame wavelength range. Since the spectral surfaces of the SN Ia models are tabulated over a much larger wavelength range than the nominal range in which the models are defined, it is straightforward to extend the wavelength range in the simulation. For both models, the lower wavelength range$^{35}$ was reduced to 2500 Å. To simulate redder passbands for SALT-II, the upper range was extended from 7000 Å to 8700 Å.

2.3. Non-Ia SN Model

Simulated photometry of non-Ia SNe was based on spectroscopically confirmed non-Ia type light curves from the CSP, SNLS, and SDSS-II SN surveys. The basic strategy is to smoothly warp a standard SED to match the observed photometry and then use the warped SEDs to simulate SNe at all redshifts. After correcting the light curves for Galactic extinction, the light curve for each passband was smoothed using a general function based on that used in the non-Ia rate analysis in Bazin et al. (2009):

$$ f(t) = A_0 \left[ 1 + a_1 (t - t_0) + a_2 (t - t_0)^2 \right] \frac{e^{-(t - t_0)/T_{\rm fall}}}{1 + e^{-(t - t_0)/T_{\rm rise}}} . \qquad (1) $$

The parameters $A_0$, $t_0$, $T_{\rm rise}$, $T_{\rm fall}$ and $a_{1,2}$ are fit separately for each passband. The polynomial parameters $a_{1,2}$ were initially fixed to zero; in cases where the fit was inadequate as determined by visual inspection, the fit was redone with the additional $a_{1,2}$ parameters. Examples of smoothed light curves, also called non-Ia templates, are shown in Fig. 1 for the non-Ia SNe that were most commonly misidentified as an SN Ia during the SNPhotCC (§5). To use a non-Ia template in the SNPhotCC, the corresponding light curve was required to have good sampling in all passbands, and this requirement was based on visual examination rather than rigorous cuts. Among the 86 spectroscopically confirmed non-Ia from the SDSS-II, 34 were selected for the SNPhotCC; for the CSP, 5 of 6 were selected, and for the SNLS, 2 of 9 were selected. A list of the 41 non-Ia SNe used in the SNPhotCC is shown in Table 1; combining the surveys, the numbers of types Ibc, II-P and IIn are 16, 23, and 2, respectively (also see Table 2).
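For readers who want to experiment with the smoothing function, the sketch below evaluates it for one passband: an amplitude times an optional quadratic polynomial in $(t - t_0)$, times an exponential decline with timescale $T_{\rm fall}$, divided by a sigmoid-like rise term with timescale $T_{\rm rise}$. The function name and the parameter values used further down are illustrative assumptions, not fitted values from the SNPhotCC, and the quadratic form of the polynomial is itself a reading of the printed equation.

```python
import math


def smoothed_flux(t, A0, t0, T_rise, T_fall, a1=0.0, a2=0.0):
    """Smoothing function for a single passband (a sketch, not SNANA code).

    a1 and a2 default to zero, matching the initial fits described in the
    text; they are only freed when the simpler fit is inadequate.
    Note: math.exp can overflow for epochs very far before t0.
    """
    dt = t - t0
    polynomial = 1.0 + a1 * dt + a2 * dt ** 2
    decline = math.exp(-dt / T_fall)          # exponential fall after peak
    rise = 1.0 + math.exp(-dt / T_rise)       # suppresses flux before t0
    return A0 * polynomial * decline / rise


# Illustrative (made-up) parameters: rise of a few days, slower decline.
example = dict(A0=100.0, t0=0.0, T_rise=3.0, T_fall=25.0)
flux_at_t0 = smoothed_flux(0.0, **example)
flux_early = smoothed_flux(-30.0, **example)
flux_late = smoothed_flux(100.0, **example)
```

With $a_1 = a_2 = 0$ the function equals $A_0/2$ at $t = t_0$, so $t_0$ is a reference epoch rather than the epoch of maximum flux.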

While the general fitting function (Eq. 1) appears adequate upon visual inspection, we note that the rise-time parametrization is not always accurate. For SN 14475 in Fig. 1, the rise time is well sampled and hence the smoothed template is reliable in this region of the light curve. For CSP-2006ep, however, the u-band rise time is not well sampled and therefore the smoothed rise time is dependent on the particular parametrization. Ideally the rise-time shape from well-measured non-Ia light curves would be used as an additional constraint in the smoothing function, but such constraints were not used in this SNPhotCC.

35 The default rest-frame wavelength ranges for MLCS2k2 and SALT-II are 3200-9500 Å and 2900-7000 Å, respectively.

The next step is to create a rest-frame time series of SEDs such that the redshifted synthetic magnitudes match those of the smoothed light-curve template at each epoch. These spectral time sequences are called "non-Ia template SEDs." The starting SED for each non-Ia subtype is taken from the Nugent template, and it is then warped at each epoch to match the observer-frame photometry. For a simulated non-Ia type and redshift, the corresponding non-Ia template SED is used to compute observer-frame griz magnitudes.

In addition to the 41 non-Ia template SEDs, we have also included four Nugent SED templates, each representing a composite average over one of the subtypes shown in Table 2. The magnitudes were drawn from Gaussian distributions as described in Richardson et al. (2002).

The final step is to apply random color variations in the same manner as for the type Ia SNe. While the anomalous scatter in the SN Ia Hubble diagram motivates this step in the SN Ia simulation, the motivation for the non-Ia simulation is to describe a potentially broader class of objects. In the limit of a large and complete set of non-Ia templates there would be no need to simulate additional sources of magnitude variation. We have made the assumption, however, that our set of 41 templates is not large enough to describe the non-Ia population.

2.4. SN Rates and Template Weights 

Following Dilday et al. (2008), the SN Ia volumetric rate ($r_V$) was parametrized as $r_V = \alpha (1 + z)^{\beta}$ with $\alpha_{\rm Ia} = 2.6 \times 10^{-5}\ {\rm Mpc}^{-3}\, h_{70}^{3}\ {\rm yr}^{-1}$, $\beta_{\rm Ia} = 1.5$, and $h_{70} = H_0/(70\ {\rm km\,s^{-1}\,Mpc^{-1}})$, where $H_0$ is the present value of the Hubble parameter. Integrating out to a redshift of $z = 1.1$, the total number of generated SNe Ia for the DES survey is ~8000, and the number written for the SNPhotCC (i.e., passing the loose cuts in §2.1) is ~5300.

For the non-Ia rate, we assumed that the redshift dependence has the same general form as for the SNe Ia. The exponent term $\beta_{\rm nonIa} = 3.6$ was taken to match that of the star formation rate. To estimate $\alpha_{\rm nonIa}$ we use the result of Bazin et al. (2009), which reports an observed non-Ia/Ia rate ratio of $4.5 \pm 1.0$ for $z < 0.4$. We then calculate $\alpha_{\rm nonIa} = 6.8 \times 10^{-5}$ such that the non-Ia/Ia rate ratio matches the observed ratio. Since the non-Ia rate has a much larger uncertainty at redshifts above 0.4, and to increase the sample of misclassified non-Ia, the non-Ia rate was arbitrarily increased by a factor of 1.3 at all redshifts. Integrating out to a redshift of $z = 1.1$, the total number of generated non-Ia for the DES survey is ~$9.3 \times 10^4$, and the number written out for the SNPhotCC is ~$1.3 \times 10^4$.
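The rate parametrization and the quoted sample sizes can be cross-checked with a short Python sketch. The helper names are hypothetical; the constants are the values quoted above, with the volumetric rates in units of ${\rm Mpc}^{-3}\, h_{70}^{3}\ {\rm yr}^{-1}$. The choice of $z = 0.3$ as a representative redshift inside the $z < 0.4$ range is an assumption made only for this illustration.

```python
ALPHA_IA, BETA_IA = 2.6e-5, 1.5         # SN Ia rate parameters quoted above
ALPHA_NONIA, BETA_NONIA = 6.8e-5, 3.6   # non-Ia rate parameters quoted above
NONIA_BOOST = 1.3                       # arbitrary increase applied at all z


def volumetric_rate(z, alpha, beta):
    """Power-law volumetric rate r_V = alpha * (1 + z)**beta."""
    return alpha * (1.0 + z) ** beta


def nonia_to_ia_rate_ratio(z, boost=1.0):
    """Ratio of the non-Ia to Ia volumetric rates at redshift z."""
    return (boost * volumetric_rate(z, ALPHA_NONIA, BETA_NONIA)
            / volumetric_rate(z, ALPHA_IA, BETA_IA))


# Unboosted ratio at an assumed representative redshift within z < 0.4.
ratio_at_z03 = nonia_to_ia_rate_ratio(0.3)

# Ratios of the approximate sample sizes quoted in the text.
generated_ratio = 9.3e4 / 8000.0   # generated non-Ia / generated Ia
written_ratio = 1.3e4 / 5300.0     # after the loose selection cuts
```

The unboosted ratio is close to the observed value of 4.5 near $z \approx 0.3$, and the two count ratios reproduce the generated and post-selection non-Ia/Ia ratios of roughly 12 and 2.4.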

The generated non-Ia/Ia ratio over all redshifts is 12. After applying the loose selection requirements for the SNPhotCC sample (§2.1), this ratio drops to 2.4. We have likely overestimated the non-Ia contribution, but this overestimate was intentional in order to increase the statistics of non-Ia SNe that are misidentified as SNe Ia.

TABLE 1

Spectroscopically Confirmed Non-Ia SNe Used for Templates

SN id            Spec type     Observed redshift   SNPhotCC index^a   In D10^b
CSP 2004fe       Ic            0.0179              05
CSP 2004gq       [illegible]   [illegible]         06
CSP [illegible]  Ic            0.0199              07
CSP 2006ep       Ic            0.0495              08
CSP 2007Y        Ic            0.0046              09
SNLS 04D1la      Ibc           0.3190              10
SNLS 04D4jv      Ic            0.2285              11
SDSS 2004hx      II-P          0.0375              12
SDSS 2004ib      Ib            0.0555              13
SDSS 2005hm      Ib            0.0339              14
SDSS 2005gi      II-P          0.0494              15                 ✓
SDSS 004012^c    Ic            0.0246              16
SDSS 2006ez      IIn           0.0876              17
SDSS 2006fo      Ic            0.0199              18
SDSS 2006gq      II-P          0.0688              19                 ✓
SDSS 2006ix      IIn           0.0745              20
SDSS 2006kn      II-P          0.1193              21                 ✓
SDSS 014475^c    Ic            0.1425              22
SDSS 2006jo      Ib            0.0757              23
SDSS 2006jl      II-P          0.0546              24                 ✓
SDSS 2006iw      II-P          0.0295              25                 ✓
SDSS 2006kv      II-P          0.0608              26                 ✓
SDSS 2006ns      II-P          0.1189              27                 ✓
SDSS 2006lc      Ic            0.0150              28
SDSS 2007ms      Ic            0.0384              29
SDSS 2007iz      II-P          0.2525              30
SDSS 2007nr      II-P          0.1433              31                 ✓
SDSS 2007kw      II-P          0.0672              32                 ✓
SDSS 2007ky      II-P          0.0725              33                 ✓
SDSS 2007lj      II-P          0.0489              34                 ✓
SDSS 2007lb      II-P          0.0326              35                 ✓
SDSS 2007ll      II-P          0.0801              36
SDSS 2007nw      II-P          0.0562              37                 ✓
SDSS 2007ld      II-P          0.0260              38                 ✓
SDSS 2007md      II-P          0.0535              39                 ✓
SDSS 2007lz      II-P          0.0928              40                 ✓
SDSS 2007lx      II-P          0.0556              41                 ✓
SDSS 2007og      II-P          0.1995              42
SDSS 2007ny      II-P          0.1452              43                 ✓
SDSS 2007nv      II-P          0.1427              44                 ✓
SDSS 2007nc      Ib            0.0856              45

a Non-Ia index used in the SNPhotCC.

b ✓ means the II-P light curve has been publicly available in D'Andrea et al. (2010) since 2010 Jan 1.

c Internal SDSS index.

The breakdown of the non-Ia into subtypes (Ibc, II-P, II-L, and IIn) is taken from Smartt et al. (2009), and the subtype fractions are shown in Table 2 along with the number of templates used to represent each subtype. Within a subtype class, each non-Ia template is given equal weight in the generation of simulated samples. For each subtype a composite Nugent template is included, and is given the same generation weight as each template based on an observed light curve.
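The equal-weight rule can be written out explicitly. In the sketch below, the subtype fractions and template counts are those of Table 2; the zero for measured II-L templates is inferred from the total of 41 measured templates rather than read directly from the table, and the function name is hypothetical.

```python
# subtype: (fraction of non-Ia, measured templates, composite Nugent templates)
SUBTYPES = {
    "Ibc":  (0.29, 16, 1),
    "II-P": (0.59, 23, 1),
    "II-L": (0.08, 0, 1),   # measured count inferred, not read from the table
    "IIn":  (0.04, 2, 1),
}


def per_template_weights(subtypes):
    """Generation weight of a single template within each subtype.

    Every template in a subtype (measured or composite) gets the same
    share of that subtype's fraction.
    """
    weights = {}
    for name, (fraction, n_measured, n_composite) in subtypes.items():
        weights[name] = fraction / (n_measured + n_composite)
    return weights


weights = per_template_weights(SUBTYPES)
total_weight = sum(
    weights[name] * (n_measured + n_composite)
    for name, (_, n_measured, n_composite) in SUBTYPES.items()
)
```

One consequence of this rule is that the single II-L composite template carries the full 8% of the non-Ia sample, far more than any individual Ibc or II-P template.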

2.5. Spectroscopic Subset 

To allow participants to train their classification algorithms, a spectroscopically confirmed training subset was provided. This subset was based on observations from a 4 m class telescope with a limiting r-band magnitude of 21.5, and on observations from an 8 m class telescope with a limiting i-band magnitude of 23.5. Using this spectroscopic selection resulted in a subset of 1256 objects, or about 7% of the total number of objects in the SNPhotCC. This training sample is not a random subset and is, in fact, a highly biased subset, as shown in Fig. 2: the true SN Ia fraction for the confirmed SNe is 70%, compared with only 26% for the unconfirmed SNe. While a truly random subset would be ideal for training classification algorithms, limited spectroscopic resources in future surveys are much more likely to yield a biased sample unless there is sufficient motivation to modify the spectroscopic targeting strategy.

Fig. 1. Spectroscopically confirmed non-Ia SN data (black dots) resulting in the most misidentified non-Ia in the SNPhotCC. The smoothing function in Eq. 1 is shown by the green curve. The SN name and redshift are listed above each set of light curves. The filter is labeled in each panel.

TABLE 2

Non-Ia Subtype Fractions and Template Statistics

Non-Ia subtype   Fraction   No. of measured templates   No. of composite templates
Ibc              0.29       16                          1
II-P             0.59       23                          1
II-L             0.08       0                           1
IIn              0.04       2                           1

Fig. 2. For SNe with r > 21.5 at peak brightness, peak i-band magnitude vs. redshift for the spectroscopically confirmed subset (left panel, "Spec-confirmed") and for 10% of the unconfirmed sample (right panel, "10% of Unconfirmed"). The SNe Ia are shown by filled black circles; the non-Ia SNe by open red squares. The dashed grid lines are shown to guide the eye.

If each SN spectrum were taken exactly at the epoch of peak brightness ($t_0$), then the efficiency for obtaining a spectrum adequate for classification would depend only on the peak magnitude. However, a spectrum is typically taken slightly before or after $t_0$, when the SN is slightly dimmer than at peak brightness; therefore we have parametrized the efficiency for obtaining a spectrum ($\epsilon_{\rm spec}$) to be

$$ \epsilon_{\rm spec} = \epsilon_0 \left( 1 - x^{\ell} \right) , \qquad x = \frac{m_{\rm peak} - M_{\rm min}}{m_{\rm lim} - M_{\rm min}} , \qquad (2) $$

where the parameters $\ell$, $M_{\rm min}$ and $m_{\rm lim}$ are given in Table 3 for the r and i filters, and $m_{\rm peak}$ is the SN magnitude at $t_0$. The coefficient $\epsilon_0 = 0.4$ for type Ia and 0.3 for non-Ia; this difference in the $\epsilon_0$ values was due to an error in the simulation (§2.6). The efficiency function is nearly flat for bright SNe and then decreases rapidly to zero at the limiting magnitude. A simulated SN is spectroscopically identified if $21.5 < m^{i}_{\rm peak} < 23.5$ and a randomly generated number (0-1) is less than $\epsilon^{i}_{\rm spec}$, or if the analogous criterion is satisfied for the r band. Since the $\epsilon_{\rm spec}$ parametrization is an educated guess, future simulations should use a more refined parametrization based on the range of epochs in which spectroscopic observations are expected to be obtained.
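A minimal Python sketch of this selection is given below. It assumes the efficiency has the form $\epsilon_0 (1 - x^{\ell})$ with $x = (m_{\rm peak} - M_{\rm min})/(m_{\rm lim} - M_{\rm min})$, which is a reading of the printed equation consistent with the "nearly flat, then rapidly falling to zero" behavior described above. The function and dictionary names are hypothetical, the parameter values are those of Table 3, and setting the efficiency to zero outside the magnitude window of each filter is an assumption.

```python
import random

# Table 3 parameters per filter: exponent, bright limit, limiting magnitude.
SPEC_PARAMS = {
    "r": {"ell": 5, "m_min": 16.0, "m_lim": 21.5},
    "i": {"ell": 6, "m_min": 21.5, "m_lim": 23.5},
}


def spec_efficiency(m_peak, band, eps0):
    """Assumed efficiency eps0 * (1 - x**ell) inside the magnitude window."""
    p = SPEC_PARAMS[band]
    if not (p["m_min"] < m_peak < p["m_lim"]):
        return 0.0                     # assumption: no selection outside window
    x = (m_peak - p["m_min"]) / (p["m_lim"] - p["m_min"])
    return eps0 * (1.0 - x ** p["ell"])


def is_spec_confirmed(m_peak_r, m_peak_i, eps0, rng):
    """An SN is confirmed if the random draw succeeds in either filter."""
    for band, m_peak in (("r", m_peak_r), ("i", m_peak_i)):
        if rng.random() < spec_efficiency(m_peak, band, eps0):
            return True
    return False


rng = random.Random(7)
eff_ia = spec_efficiency(22.0, "i", eps0=0.4)       # type Ia coefficient
eff_nonia = spec_efficiency(22.0, "i", eps0=0.3)    # non-Ia coefficient
confirmed = is_spec_confirmed(22.3, 22.0, eps0=0.4, rng=rng)
```

At fixed peak magnitude the Ia-to-non-Ia efficiency ratio is 0.4/0.3, which is the origin of the 33% difference in confirmed fractions.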



2.6. Bugs 

Here, we begin with the bugs that were identified and fixed before the SNPhotCC deadline for submissions; we then report bugs that were present during the SNPhotCC and fixed after the submission deadline. The identification of bugs by the participants resulted in three updates during the SNPhotCC. For each update, only a small (~1%) fraction of the sample was modified, although the last update resulted in a 10% reduction in the sample size. A summary of bugs is shown in Table 4.

TABLE 3

Efficiency Parameters for Spectroscopic Observations

Filter   $\ell$   $M_{\rm min}$   $m_{\rm lim}$
r        5        16.0            21.5
i        6        21.5            23.5

The first bug resulted in a small fraction of the SNe Ia having late-time fluxes that were much larger than the flux at the nominal epoch of peak brightness. This bug was induced by a poorly constrained quadratic term for the shape parameter correction in the MLCS-U2 model,$^{36}$ and it only affected fast-declining SNe Ia at epochs well past peak brightness. This artifact was removed by introducing a damping function for the quadratic term.

The second bug resulted in a small fraction of the non-Ia SNe being much brighter than the SNe Ia. This bug was caused by using an untruncated Gaussian distribution to select random magnitudes for the small fraction of non-Ia based on the Nugent SED templates. This bug was fixed by requiring the random numbers to lie within $\pm 2\sigma$ of the mean.

The next issue involved an ambiguous redshift for SDSS SN 2004hx. The original redshift used to make the SED template was based on the host galaxy ($z_{\rm host} = 0.0382$) and led to an exceptionally bright type II SN. However, the preliminary redshift from the SN spectrum is $z_{\rm SN} = 0.014$, suggesting a type II SN with normal brightness. During the SNPhotCC we changed this SED template to use the normal SN brightness and left the redshift ambiguity to be resolved in a future analysis.

The remaining four bugs described below were not corrected until after the SNPhotCC. The first unfixed bug is related to the rest-frame wavelength ranges covered by the SN models. While the non-Ia models are defined for all rest-frame wavelength ranges, the valid wavelength range of both SN Ia models was restricted to be above 2500 Å. This wavelength restriction resulted in undefined g-band model magnitudes for SNe Ia at $z > 0.8$. To warn users about observations with undefined model magnitudes, the SNANA simulation treats undefined model values by writing the flux as $-9 \pm -9$. This feature of the simulation was not noticed during the preparation of the SNPhotCC, and therefore high-redshift SNe Ia can be identified by simply inspecting the g-band flux value. For SNPhotCC participants who included these invalid g-band fluxes as if they were valid measurements, the absolute value of the uncertainty is a few times larger than the sky noise. Therefore, by accidental good luck, this invalid value is consistent with the correct value based on the sky noise and the very small SN Ia flux expected in the far-ultraviolet region.

There were significantly more SNe Ia generated by the SALT-II model than by the MLCS-U2 model. The primary reason is that we mistakenly used symmetric color and stretch distributions for SALT-II, while using the measured asymmetric distributions for MLCS-U2. The missing non-Gaussian tails in the SALT-II distributions resulted in an SN Ia sample that was ~0.2 mag too bright on average. This issue is discussed further later in this article.

36 See the Q parameter in Jha et al. (2007).

TABLE 4

Summary of Bugs in the SNPhotCC Simulation

Date of bug fix   Description of bug
Mar 14, 2010      Enormous fluxes for late-time (fast-declining) SNe Ia generated with MLCS-U2
Mar 24, 2010      Extremely bright non-Ia from untruncated Gaussian smearing in Nugent template mags
Apr 13, 2010      Ambiguous redshift for 2004hx
After SNPhotCC    g-flux and error are -9 for SNe Ia with z > 0.8
After SNPhotCC    Average SALT-II SN Ia is 0.2 mag too bright due to missing tails
After SNPhotCC    Each non-Ia SED template is too dim by a factor of $1 + z_{\rm obs}$
After SNPhotCC    No pre-explosion epochs
After SNPhotCC    Spectroscopic fractions were different for Ia and non-Ia
Not fixed         Trivial to cheat on entire SNPhotCC sample

This next bug is by far the most embarrassing. Each non-Ia SED template is too dim by a factor of $1 + z_{\rm obs}$, where $z_{\rm obs}$ is the observed redshift of the non-Ia SN used to construct the template; note that $z_{\rm obs}$ is not the simulated redshift. Thus for a non-Ia template constructed from an SN at $z_{\rm obs} = 0.1$, all simulated SNe based on this template were 10% too dim. Figure 1 shows that some of the most commonly misidentified non-Ia light curves in the SNPhotCC were based on SDSS SNe with $0.1 < z_{\rm obs} < 0.25$, and therefore these simulated non-Ia SNe were 10-25% too dim. The combination of SNe Ia that are too bright (previous bug) and non-Ia SNe that are too dim may have made the photometric challenge somewhat easier for some methods.
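To compare this flux error with the ~0.2 mag SN Ia offset from the previous bug, it helps to express it in magnitudes. The sketch below uses the standard relation between a flux ratio and a magnitude difference, $\Delta m = 2.5 \log_{10}(1 + z_{\rm obs})$; the function name is illustrative.

```python
import math


def template_dimming_mag(z_obs):
    """Magnitude offset of a template whose flux is too low by 1 + z_obs."""
    return 2.5 * math.log10(1.0 + z_obs)


# Offsets at the two ends of the redshift range quoted for the most
# commonly misidentified SDSS templates.
offset_low = template_dimming_mag(0.10)
offset_high = template_dimming_mag(0.25)
```

For the quoted range $0.1 < z_{\rm obs} < 0.25$ the offsets are roughly 0.1 to 0.24 mag, comparable in size to the SN Ia brightness offset.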

To improve analysis efficiency, the SNANA simulation was originally designed to exclude pre-explosion epochs. Although pre-explosion epochs should have been included in the SNPhotCC sample, we did not notice the missing epochs until one of the participants acknowledged using this feature to estimate the time of peak brightness.

As described in §2.5, the spectroscopically confirmed fraction was different for the SN types: for SNe passing the spectroscopic magnitude limits, the type Ia SNe were confirmed 33% more often than the non-Ia. The last known bug is that there is a trivial way to identify each SN type without any knowledge of SN science. After all submissions had been received, an "SN Cheater Challenge" was offered on 2010 June 2; it was solved 16 hours later by Sako (see Table 5), but so far nobody else has solved it.

3. TAKING THE SN CLASSIFIER CHALLENGE 

As described in §2, two independent challenges were generated: one with a host-galaxy photo-z for each SN and another without any redshift information. In addition to these challenges based on the entire light curve, there was also an early-epoch challenge motivated by the need to prioritize SNe for spectroscopic follow-up observations; this challenge was based on the first six photometric observations (in any filter) with S/N > 4. Participants attempted the full light-curve challenges with and without redshift information, but none of the participants attempted the early-epoch challenge, due to time limitations and the greater interest in the full light-curve challenge that will eventually impact the cosmology analyses.

The simulated light curves are available at the SNPhotCC Web site.$^{37}$ Details on how to analyze the simulated sample are given in §3 of the SNPhotCC release note. To fully optimize classification algorithms during the challenge, several participants wanted to know the exact value of the false-tag weight (§4) used to determine the figure of merit. On 2010 April 27 we therefore publicly announced that $W^{\rm false}_{\rm Ia} = 3$; while this information clearly helped some participants optimize results for the confirmed subset, it is not clear if the information improved results for the unconfirmed sample.

A total of 10 groups (or individuals) sent 22 submissions
to be evaluated. Among the submissions, 13 are
based on the SNPhotCC/HOSTZ, while the remaining 9 are
based on the SNPhotCC/noHOSTZ. Photo-z estimates were
given by four participants in the SNPhotCC/HOSTZ and by
three participants in the SNPhotCC/noHOSTZ.

Table 5 shows the list of groups and participants,
indicates which challenge(s) were taken, and indicates if SN
photo-z estimates were given. The average processing
time is also given for each method, and these times vary
from 1 s to > 200 s per SN using similar processors. A
brief description for each method is given in Appendix A.

Among the participants, four general strategies were
used to classify SNe. The first and simplest strategy
was to fit each light curve to an SN Ia model and use
the "duck test" philosophy: if it looks like a duck (i.e.,
an SN Ia) and quacks like a duck, then it is a duck.
Selection cuts, mainly on the minimum $\chi^2$, were used
to determine which SNe are type Ia, and there was no
attempt to classify a subtype for non-Ia. This strategy
was used by Gonzalez, Portsmouth-$\chi^2$ and SNANA cuts.
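As a toy illustration of the first strategy, the sketch below fits a fixed-shape SN Ia model to a light curve with only a free amplitude and applies a cut on the minimum $\chi^2$ per degree of freedom. The single free parameter, the function names, and the threshold `max_chi2_dof` are assumptions made for illustration; the actual entries used full light-curve fitters (SiFTO, SALT-II, MLCS).

```python
import numpy as np


def fit_amplitude(flux, flux_err, model_flux):
    """Best-fit scale factor and minimum chi^2 for a fixed-shape model."""
    flux = np.asarray(flux, float)
    model_flux = np.asarray(model_flux, float)
    w = 1.0 / np.asarray(flux_err, float) ** 2
    # Linear least squares for a single multiplicative amplitude.
    amp = np.sum(w * flux * model_flux) / np.sum(w * model_flux ** 2)
    chi2 = float(np.sum(w * (flux - amp * model_flux) ** 2))
    return float(amp), chi2


def passes_ia_cut(flux, flux_err, model_flux, max_chi2_dof=2.0):
    """Tag as SN Ia when the minimum chi^2 per degree of freedom is small."""
    _, chi2 = fit_amplitude(flux, flux_err, model_flux)
    dof = len(flux) - 1  # one fitted parameter (the amplitude)
    return dof > 0 and chi2 / dof <= max_chi2_dof
```

Anything that fails the cut is simply "not Ia"; as in the text, no non-Ia subtype is assigned.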

The second strategy compares each light curve against
both SN Ia and non-Ia templates, and uses the Bayesian
probabilities to determine the most likely SN type.
Poz2007 used the simplest Bayesian implementation,
with a single Ia and non-Ia template. Belov & Glazov
and Sako used SN Ia templates that depend on stretch
and extinction, and they also used several non-Ia
templates. Sako included 8 non-Ia templates from the
SDSS-II, although there was no coordination between
his template development for classification and the
development of templates for the SNPhotCC. Rodney used a
variant of this technique by accounting for the fact that
templates from observed SNe do not form a complete
set. MGU+DU used another variation by using slopes
(mag/day) at four different epochs and comparing with
slopes expected for type Ia and non-Ia SNe.
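A minimal sketch of the template-comparison idea, assuming Gaussian errors so that each template contributes a likelihood proportional to $\exp(-\chi^2/2)$. The function name, the dictionary layout, and the optional per-type prior are hypothetical; the individual entries differ in their templates, priors, and marginalization.

```python
import math
from typing import Dict, Optional


def type_probabilities(chi2_by_template: Dict[str, float],
                       type_of_template: Dict[str, str],
                       prior: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Normalized probability per SN type from per-template chi^2 values."""
    if not chi2_by_template:
        return {}
    # Subtract the smallest chi^2 to avoid numerical underflow in exp().
    chi2_min = min(chi2_by_template.values())
    weights: Dict[str, float] = {}
    for name, chi2 in chi2_by_template.items():
        sn_type = type_of_template[name]
        p = prior.get(sn_type, 1.0) if prior else 1.0
        like = math.exp(-0.5 * (chi2 - chi2_min))
        weights[sn_type] = weights.get(sn_type, 0.0) + p * like
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}
```

The most likely type is then `max(probs, key=probs.get)`, and an entry can reject SNe whose best probability falls below a chosen threshold.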

The third strategy used spectroscopically confirmed
SNe Ia to parametrize a Hubble diagram, and then
identified SNe Ia as those SNe that lie near the expected
Hubble diagram. Portsmouth-Hub used a high-order
polynomial to define the Hubble diagram while JEDI-Hub used
the kernel density estimation technique.

37 www.hep.anl.gov/SNchallenge

In the last strategy (InCA and JEDI-KDE) each light 
curve was fit with a parametric function such as a spline, 
and the fitted parameters were used for statistical infer- 
ences. Light-curve fitting parameters such as stretch and 
color were not explicitly used. 

4. EVALUATING THE SNPhotCC 

Ideally we would like to assign a single number, or
figure of merit (FoM), for each SNPhotCC submission. We
begin the discussion by considering a measurement of the
SN Ia rate based on photometric identification. After
selection requirements have been applied, let
$N^{\rm true}_{\rm Ia}$ be the number of correctly typed SNe Ia, and
$N^{\rm false}_{\rm Ia}$ be the number of non-Ia that are incorrectly
typed as an SN Ia. A simple classification FoM is the square
of the S/N divided by the total number of SNe Ia
($N^{\rm TOT}_{\rm Ia}$) before selection cuts,

$$ C_{\rm FoM-Ia} \equiv \frac{1}{N^{\rm TOT}_{\rm Ia}} \times \frac{(N^{\rm true}_{\rm Ia})^2}{N^{\rm true}_{\rm Ia} + W^{\rm false}_{\rm Ia} N^{\rm false}_{\rm Ia}} = \epsilon_{\rm Ia} \times PP_{\rm Ia}, \qquad (3) $$

where $\epsilon_{\rm Ia} = N^{\rm true}_{\rm Ia}/N^{\rm TOT}_{\rm Ia}$ is the SN Ia
efficiency that includes both selection and classification
requirements,
$PP_{\rm Ia} = N^{\rm true}_{\rm Ia}/(N^{\rm true}_{\rm Ia} + W^{\rm false}_{\rm Ia} N^{\rm false}_{\rm Ia})$
is the pseudopurity, and $W^{\rm false}_{\rm Ia}$ is the false-tag
weight (penalty factor). Since $N^{\rm TOT}_{\rm Ia}$ is a constant that
is independent of the analysis, we have divided out this
term so that $0 < C_{\rm FoM-Ia} < 1$, with $C_{\rm FoM-Ia} = 1$
corresponding to the theoretically optimal analysis.
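The figure of merit is straightforward to evaluate from three counts. The sketch below is a direct transcription of the definitions above (efficiency times pseudopurity); the function and argument names are chosen here for illustration, and the default `w_false=3.0` is the value adopted for the challenge.

```python
def classification_fom(n_true: float, n_false: float, n_total: float,
                       w_false: float = 3.0):
    """Return (C_FoM-Ia, efficiency, pseudopurity).

    n_true  : correctly typed SNe Ia after all cuts
    n_false : non-Ia incorrectly typed as SN Ia
    n_total : all SNe Ia before selection cuts
    w_false : false-tag weight (penalty factor)
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    efficiency = n_true / n_total
    denom = n_true + w_false * n_false
    pseudopurity = n_true / denom if denom > 0 else 0.0
    return efficiency * pseudopurity, efficiency, pseudopurity
```

Setting `w_false=1.0` returns the traditional purity in place of the pseudopurity. For example, with made-up counts of 80 true tags, 20 false tags, and 100 SNe Ia in total, the purity is 0.8, whereas the pseudopurity with `w_false=3.0` is 80/140.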

When $W^{\rm false}_{\rm Ia} = 1$, the denominator in $PP_{\rm Ia}$ comes from
the Poisson noise term in the S/N, and $PP_{\rm Ia}$ can be
interpreted as the traditional purity factor defined as the
fraction of classified Ia that really are SNe Ia. In the ideal
case where the mean of $N^{\rm false}_{\rm Ia}$ is perfectly determined,$^{38}$
the naive Poisson uncertainty is the only contribution
to the noise term and therefore $W^{\rm false}_{\rm Ia} = 1$. In
practice, however, uncertainties in determining the false-tag
rate lead to $W^{\rm false}_{\rm Ia} > 1$. For example, suppose that the
estimate of $N^{\rm false}_{\rm Ia}$ is scaled from a spectroscopically
confirmed subset containing a fraction ($\epsilon_{\rm spec}$) of the total
number of SNe; in this case, the Poisson noise term is
defined by setting $W^{\rm false}_{\rm Ia} = 1 + \epsilon^{-1}_{\rm spec}$, and
$W^{\rm false}_{\rm Ia} \gg 1$ if the spectroscopic subset is small.
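One way to motivate the scaling with $\epsilon_{\rm spec}$ is sketched below. It assumes pure Poisson statistics, treats the count in the confirmed subset as independent of the total number of tagged SNe, and introduces $n^{\rm false}_{\rm spec}$ (false tags found in the confirmed subset) and $N^{\rm tag}$ (all SNe tagged as Ia) purely for this illustration.

```latex
\begin{align}
\hat{N}^{\rm false}_{\rm Ia} &= n^{\rm false}_{\rm spec}/\epsilon_{\rm spec},
\qquad
{\rm Var}\big(\hat{N}^{\rm false}_{\rm Ia}\big)
  = n^{\rm false}_{\rm spec}/\epsilon_{\rm spec}^{2}
  \simeq N^{\rm false}_{\rm Ia}/\epsilon_{\rm spec}, \\
{\rm Var}\big(N^{\rm tag} - \hat{N}^{\rm false}_{\rm Ia}\big)
  &\simeq \big(N^{\rm true}_{\rm Ia} + N^{\rm false}_{\rm Ia}\big)
   + N^{\rm false}_{\rm Ia}/\epsilon_{\rm spec}
  = N^{\rm true}_{\rm Ia}
   + \big(1 + \epsilon_{\rm spec}^{-1}\big)\,N^{\rm false}_{\rm Ia}.
\end{align}
```

Comparing the last expression with the denominator of the pseudopurity identifies the false-tag weight as $1 + \epsilon_{\rm spec}^{-1}$ under these assumptions.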

When using SNe Ia for cosmological applications, it may
be possible to reduce $W^{\rm false}_{\rm Ia}$ using other methods to
determine $N^{\rm false}_{\rm Ia}$, such as fitting the tails in the
distance-modulus residuals. A proper determination of
$W^{\rm false}_{\rm Ia}$ is beyond the scope of this classification
challenge, and we have therefore arbitrarily set
$W^{\rm false}_{\rm Ia} = 3$. While this value is well below
$1/\epsilon_{\rm spec} \sim 15$ based on using the spectroscopically
confirmed subset, $W^{\rm false}_{\rm Ia}$ is notably larger
than unity and therefore penalizes incorrect classifications
more than rejected SNe.



38 The mean value is the average over many independent measurements.



5. RESULTS 

Here, we give a relatively brief overview of the main
results and comparisons. Ideally, we would fully
understand the strengths and weaknesses for each entry, but
this level of detail is deferred to future analyses from
individual participants. Also, since the results presented
here are simply a starting point for these studies, a
detailed postchallenge analysis could soon become obsolete
as the algorithms are improved. Finally, the most
important goal here is not to identify the best method now,
but to motivate improvements and then identify the best
method appropriate to each SN survey.

We begin by showing the non-Ia SNe that were
misidentified as SNe Ia. For each challenge entry we have
computed the fraction of false SN Ia tags corresponding
to each non-Ia SED template: the sum of these fractions
equals one for each entry. Fig. 3 shows the false-tag
fractions averaged over all entries, and they are sorted from
largest to smallest. For both challenges (with and without
host-galaxy photo-z), the most frequently misidentified
non-Ia is based on SN 2006ep (SNPhotCC index = 8;
see Table 2), a spectroscopically confirmed SN Ic with a
rest-frame g-band peak magnitude of -19.1 mag. While
the generated fraction for each Ibc SED template is 1.7%
of the total, simulated non-Ia SNe based on 2006ep
account for ~20% of all misidentified SNe Ia. The second
most frequently misidentified non-Ia template, accounting
for 8% of all falsely tagged SNe Ia, is based on SN
2006ns (SNPhotCC index = 27), a spectroscopically
confirmed type II-P SN with a g-band peak magnitude of
-18.3 mag.
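The per-template false-tag fractions can be computed as in the sketch below, which assumes each entry is available as parallel sequences of true types, template indices, and Ia tags. These input layouts and the function names are assumptions for illustration, not the evaluation code used for the challenge.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def false_tag_fractions(true_types: Sequence[str],
                        template_ids: Sequence[Hashable],
                        tagged_ia: Sequence[bool]) -> Dict[Hashable, float]:
    """Fraction of false SN Ia tags per non-Ia template (sums to one)."""
    counts = Counter(
        tid for t, tid, tag in zip(true_types, template_ids, tagged_ia)
        if tag and t != "Ia"
    )
    total = sum(counts.values())
    return {tid: n / total for tid, n in counts.items()} if total else {}


def average_fractions(entries: List[Dict[Hashable, float]]
                      ) -> List[Tuple[Hashable, float]]:
    """Average the fractions over entries, sorted from largest to smallest."""
    ids = set().union(*entries)
    avg = {tid: sum(e.get(tid, 0.0) for e in entries) / len(entries)
           for tid in ids}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)
```

A template that an entry never mis-tags contributes a fraction of zero for that entry when averaging.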

The results from the SN Ia evaluations (§4) are shown
in Figures 4-5, corresponding to the challenges with and
without host-galaxy photo-z information. As a function
of the true (generated) redshift, we have plotted
the figure-of-merit quantity $C_{\rm FoM-Ia}$ (Eq. 3), efficiency
($\epsilon_{\rm Ia}$), pseudopurity ($PP_{\rm Ia}$), and true purity. For each
variable, the redshift dependence is shown separately for
the spectroscopically confirmed subset (solid) and the
unconfirmed SNe (dashed). The label on each panel
indicates the name of the participant or group. The first
panel labeled "All Ia tag" is an arbitrary reference in
which every SN has been tagged as an SN Ia, thereby
ensuring 100% efficiency. The corresponding results for
type II classifications are shown in Fig. 6.

For the SN Ia classifications, the most notable trend in
all of the entries is that the figure of merit ($C_{\rm FoM-Ia}$) is
significantly worse for the unconfirmed sample than for
the spectroscopically confirmed subset. Depending on
the redshift, the confirmed-unconfirmed differences vary
by tens of percent to nearly an order of magnitude.
Several methods show improving $C_{\rm FoM-Ia}$ with redshift. We
see this trend for the spectroscopically confirmed "All Ia"
entry because at high redshift anything bright enough to
obtain a spectrum is likely to be an SN Ia.

For the unconfirmed SN subset, the largest $C_{\rm FoM-Ia}$
value in any redshift bin is about 0.6, but these entries
show at least a factor-of-2 variation in $C_{\rm FoM-Ia}$ as a
function of redshift. The most stable figure of merit versus
redshift (for unconfirmed SNe) has $C_{\rm FoM-Ia}$ between 0.3 and 0.45
at all redshifts. The largest variation is
$0.1 < C_{\rm FoM-Ia} < 0.6$.

In spite of the caveats about trying to determine the
best method in this first SNPhotCC, here we carefully
examine the $C_{\rm FoM-Ia}$ for the unconfirmed sample in the
SNPhotCC/HOSTZ (Fig. 4). The entry with the highest
average figure of merit (Sako) has an average SN Ia
efficiency of 0.96 and an average SN Ia purity (i.e.,
$W^{\rm false}_{\rm Ia} = 1$) of 0.79. However, comparing the best figure
of merit (vs. redshift) for each strategy shows that three
strategies yield similar results: selection cuts, Bayesian
probabilities and statistical inference. The remaining
Hubble-diagram strategy is somewhat worse at low and
high redshifts. Among the entries for a given strategy
there is a large variation in the figure of merit,
suggesting that the optimum has not been achieved. For
participants who applied the same method to both the
SNPhotCC/HOSTZ and the SNPhotCC/noHOSTZ, the average
$C_{\rm FoM-Ia}$ was smaller for the SNPhotCC/noHOSTZ by
as little as 6% (Sako and JEDI-KDE) and by as much as
a factor of 2.

TABLE 5

List of Participants in the SNPhotCC.

Participants                Abbreviation^a       Classified^{b/c}  SN photo-z^d  CPU^e  Description (strategy class^f)
--------------------------------------------------------------------------------------------------------------------------------------
P. Belov and S. Glazov      Belov & Glazov       yes / no          no            90     light curve $\chi^2$ test against Nugent templates (2)
S. Gonzalez                 Gonzalez             yes / yes         no            120    cuts on SiFTO fit $\chi^2$ and fit parameters (1)
J. Richards, Homrighausen,  InCA^g               no / yes          no            1      Spline fit & nonlinear dimensionality reduction (4)
C. Schafer, P. Freeman
J. Newling, M. Varuguese,   JEDI-KDE             yes / yes         no            10     Kernel Density Evaluation with 21 params (4)
B. Bassett, R. Hlozek,      JEDI Boost           yes / yes         no            10     Boosted decision trees (4)
D. Parkinson, M. Smith,     JEDI-Hubble          yes / no          no            10     Hubble diagram KDE (3)
H. Campbell, M. Hilton,     JEDI Combo           yes / no          no            10     Boosted decision trees + Hubble KDE (3+4)
H. Lampeitl, M. Kunz,
P. Patel (JEDI group^h)
S. Philip, V. Bhatnagar,    MGU+DU-1^i           no / yes          no            < 1    light curve slopes & Neural Network (2)
A. Singhal, A. Rai,         MGU+DU-2^i           no / yes          no            < 1    light curve slopes & Random Forests (2)
A. Mahabal, K. Indulekha
H. Campbell, B. Nichol,     Portsmouth $\chi^2$  yes / no          no            1      SALT2-$\chi^2$ & False Discovery Rate Statistic (1)
H. Lampeitl, M. Smith       Portsmouth-Hubble    yes / no          no            1      Deviation from parametrized Hubble diagram (3)
D. Poznanski                Poz2007 RAW          yes / no          yes           2      SN Automated Bayesian Classifier (SN-ABC) (2)
                            Poz2007 OPT          yes / no          yes           2      SN-ABC with cuts to optimize $C_{\rm FoM-Ia}$ (2)
S. Rodney                   Rodney               yes / yes         yes           230    SN Ontology with Fuzzy Templates (2)
M. Sako                     Sako                 yes / yes         yes           120    $\chi^2$ test against grid of Ia/II/Ibc templates (2)
S. Kuhlmann, R. Kessler     SNANA cuts           yes / yes         yes           2      Cut on MLCS fit probability, S/N & sampling (1)
--------------------------------------------------------------------------------------------------------------------------------------
a Groups are listed alphabetically by abbreviation.
b Classifications included for SNPhotCC/HOSTZ.
c Classifications included for SNPhotCC/noHOSTZ.
d Photo-z estimates included.
e Average processing time per SN (seconds) using similar 2-3 GHz cores.
f From §3, strategy classes are 1) selection cuts, 2) Bayesian probabilities, 3) Hubble-diagram parametrization and 4) statistical inference.
g International Computational Astrophysics Group: http://www.incagroup.org
h Joint Exchange and Development Initiative: http://jedi.saao.ac.za
i MGU=Mahatma Gandhi University, DU=Delhi University.

The photo-z residuals are shown in Fig. 7 for those
entries that include photo-z estimates. Here we show
residuals only for true SNe Ia that have been correctly typed
as an SN Ia. When the host-galaxy photo-z is available,
the supernova light curve improves the photo-z precision
for redshifts up to about 0.4. For the SNPhotCC/noHOSTZ,
the bias and scatter of the residuals are significantly larger
than for the SNPhotCC/HOSTZ.
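The residuals shown in Fig. 7 are $(z_{\rm phot} - z_{\rm gen})/(1 + z_{\rm gen})$, summarized by their mean and rms in redshift bins. A minimal sketch of that summary follows; the binning by generated redshift and the choice of the standard deviation about the mean as the "rms" are assumptions made here.

```python
import numpy as np


def photoz_residual_stats(z_phot, z_gen, bin_edges):
    """Mean and rms of (z_phot - z_gen)/(1 + z_gen) in bins of z_gen."""
    z_phot = np.asarray(z_phot, float)
    z_gen = np.asarray(z_gen, float)
    resid = (z_phot - z_gen) / (1.0 + z_gen)
    # Bin index for each SN; values outside the edges match no bin below.
    idx = np.digitize(z_gen, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    mean = np.full(n_bins, np.nan)
    rms = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = resid[idx == b]
        if sel.size:
            mean[b] = sel.mean()  # bias in this redshift bin
            rms[b] = sel.std()    # scatter about the mean
    return mean, rms
```

Empty bins are returned as NaN so that they can be skipped when plotting.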

After evaluating the classification results and
algorithms, two notable problems were identified in the
implementations. First, the spectroscopically confirmed
subset was generally treated as a random subset, which it
clearly is not (§2.5). The magnitude-limited selection of
spectroscopic targets resulted in the selection of brighter
objects in the training subset. In principle, the brighter
objects in the training subset should be re-simulated at
higher redshifts so that classification algorithms can be
trained on more distant (dimmer) objects for which
spectra cannot be obtained.

The second general problem is that several entries did
not use all available information from the light curves
(most notably, ignoring colors), or effectively added noise
to the information. The latter was mainly an artifact
from a very poor determination of the epoch of maximum
brightness. Specific details of these problems are given
in Appendix A.

Fig. 3. — Among all generated non-Ia SNe that were falsely tagged as an SN Ia, the average fraction of each non-Ia template is shown.
Error bars reflect the rms dispersion among the submissions. The SNPhotCC index shown above each data point is translated in Table 2.

Fig. 4. — For each participant in the SNPhotCC/HOSTZ, results vs. redshift are shown for $C_{\rm FoM-Ia}$, efficiency ($\epsilon_{\rm Ia}$), pseudopurity ($PP_{\rm Ia}$), and the
true purity ($PP_{\rm Ia}$ with $W^{\rm false}_{\rm Ia} = 1$). The first panel labeled "All Ia tag" is an arbitrary reference in which every SN has been tagged as an
SN Ia, thereby ensuring 100% efficiency. The solid curves show $\pm 1\sigma_{\rm stat}$ values for the spectroscopically confirmed subset, and the dashed
curves are for the unconfirmed SNe. Entries are arranged by method categories 1-4 (§3), as indicated in parentheses under the participant
names in the first panel.

Fig. 5. — For each participant in the SNPhotCC/noHOSTZ, results vs. redshift are shown for $C_{\rm FoM-Ia}$, efficiency ($\epsilon_{\rm Ia}$), pseudopurity ($PP_{\rm Ia}$), and the
true purity ($PP_{\rm Ia}$ with $W^{\rm false}_{\rm Ia} = 1$). The first panel labeled "All Ia tag" is an arbitrary reference in which every SN has been tagged as an
SN Ia, thereby ensuring 100% efficiency. The solid curves show $\pm 1\sigma_{\rm stat}$ values for the spectroscopically confirmed subset, and the dashed
curves are for the unconfirmed SNe. Entries are arranged by method categories 1-4 (§3), as indicated in parentheses under the participant
names in the first panel.

The first panel labeled "All II tag" is an arbitrary reference in which every SN has been tagged as a type II, thereby ensuring 100% efficiency.
The solid curves show $\pm 1\sigma_{\rm stat}$ values for the spectroscopically confirmed subset, and the dashed curves are for the unconfirmed SNe.

Fig. 7. — Photo-z residuals $(z_{\rm phot} - z_{\rm gen})/(1 + z_{\rm gen})$ vs. redshift for the SNPhotCC/HOSTZ (left) and SNPhotCC/noHOSTZ (right). The mean
residual is shown by the solid curves, and the rms by the dashed curves. The dotted curves (same in each SNPhotCC/HOSTZ panel) show the
rms for the host-galaxy photo-z. Note that the vertical scales are different for the left and right plots.

6. UPDATED SIMULATIONS

While we have no plans for another competition-style
challenge, we have released updated simulated samples
as a public resource for the development of photometric
SN classification and photo-z estimators.$^{39}$ For these
updated samples we have fixed the known bugs (§2.6),
made some improvements, provided additional samples
corresponding to the LSST (LSST Science Collaboration
2009) and SDSS-II surveys, and included the answer keys
giving the generated type and other parameters for each
SN. The answer keys will allow developers to study
different spectroscopically confirmed training subsets, and
to evaluate their own analysis.

The updated simulations have two main improvements
related to the generation of SNe Ia. The first
improvement is a more realistic modeling of color
variations based on recent results from Guy et al. (2010).
The newly measured variation is about 0.05 mag (Gaussian
sigma) in the ultraviolet wavelength region and
~0.02 mag in the other wavelength regions. These
variations are significantly smaller than what was used in
the SNPhotCC, where an independent variation per
passband was drawn randomly from a Gaussian distribution
with $\sigma_m = 0.09$ mag. To obtain a reasonable Hubble
scatter in the updated simulations, a 0.12 mag random
Gaussian smearing is added coherently to all epochs and
passbands. The second improvement is to use more
realistic distributions of color and stretch ($x_1$ parameter)
for the SNe Ia generated with the SALT-II model. These
distributions include more realistic tails corresponding to
dimmer SNe, resulting in fewer SALT-II-generated SNe Ia
satisfying the loose selection criteria. The sample sizes
generated from the MLCS and SALT-II models are thus
very similar, in contrast to the larger SALT-II sample in
the SNPhotCC (§2).

7. CONCLUSION 

We have presented results from the SN classification
challenge that finished 2010 June 1. Among the four
basic strategies that were used in the SNPhotCC (§3),
three strategies show comparable results for the entries
with the highest figure of merit. Therefore no particular
strategy was notably superior. For all of the entries, the
classification performance was significantly better for the
spectroscopic training subset than for the unconfirmed
sample. The degraded performance on the unconfirmed
sample was in part due to participants not accounting
for the bias in the spectroscopic training sample.

There is a large variation in the figure of merit and
therefore we urge caution in using these evaluations to
determine the best method. The quality of each
implementation varies significantly between participants
(Appendix A) and therefore some improvements are needed
before drawing clearer conclusions. While this article
signifies the end of the SNPhotCC, we consider this
effort to be the start of a new era for developing
classification methods with significantly improved simulation
tools. The results from this SNPhotCC may serve as a
reference to assess future progress from using improved
algorithms and improved simulations. As described in §6,
these updated simulations, along with the answer keys
giving the true type for each SN, are publicly available.

39 http://sdssdp62.fnal.gov/sdsssn/SIMGEN_PUBLIC

While the classification algorithm can in principle be
optimized after a survey has completed, it is
advantageous to define the necessary spectroscopic training
sample before a survey has started. In particular, is
a magnitude-limited training sample adequate (i.e., as
used in this SNPhotCC), or is a less biased training
sample needed? The latter sample is clearly more desirable
for training classification algorithms, but this strategy
results in fewer spectroscopically confirmed SNe Ia. As
described in §6, this issue can be investigated more
thoroughly by defining arbitrary spectroscopic training
subsets for the publicly available simulated samples.

To optimize the use of a magnitude-limited sample, we
suggest another strategy that was not tried by any of the
participants. In principle the spectroscopically confirmed
non-Ia sample can be used to simulate non-Ia SNe at
higher redshifts to obtain an extended training sample
for the classification algorithms. In contrast to an ideal
unbiased spectroscopic sample, however, this simulation
strategy does not account for changes in the relative rates
with redshift.

The figure of merit used in this challenge (§4) allows
for a quantitative comparison between methods, but does
not quantify the impact of photometric classification on
the inference of cosmological parameters. Therefore, an
important next step in using these simulations is to carry
out a full analysis that includes the determination of
cosmological parameters from a Hubble diagram.

We are grateful to the Carnegie Supernova Project
(CSP), Sloan Digital Sky Survey-II (SDSS-II) and
Supernova Legacy Survey collaborations for providing
unpublished spectroscopically confirmed non-Ia light
curves that are critical to this work. Funding for the
creation and distribution of the SDSS and SDSS-II has
been provided by the Alfred P. Sloan Foundation, the
participating institutions, the National Science
Foundation, the U.S. Department of Energy, the National
Aeronautics and Space Administration, the Japanese
Monbukagakusho, the Max Planck Society, and the Higher
Education Funding Council for England. The CSP has
been supported by the National Science Foundation
under grant AST-0306969.

APPENDIX

CLASSIFICATION METHODS FROM SNPhotCC PARTICIPANTS 

Belov & Glazov. — For each SN from the challenge, the public SNANA simulation was used to generate simulated
SNe at the same epochs as the challenge SN. The epoch of peak brightness ($t_0$) was estimated to be 18 days after the
first g-band measurement, thereby taking advantage of a bug in the SNPhotCC (the excluded pre-explosion epochs; §2.6).
However, this estimate of $t_0$ did not account for the redshift or stretch. Types Ia, Ibc, II-P, IIn, and II-L were
generated, and the non-Ia were based solely on the publicly available Nugent SED templates. The classification was
then based on the minimum $\chi^2$ between the challenge SN and the SNANA-generated SNe. SNPhotCC SNe with large
minimum $\chi^2$ were rejected.

Gonzalez. — SN Ia identification used the SiFTO light-curve fitter (Conley et al. 2008) that was developed by the
SNLS. This fitting program was modified to include the redshift as a free parameter (Sullivan et al. 2006). The fitted
values of the color, stretch and $\chi^2$ were used to determine if a candidate SN was a type Ia, but these values were not
used to classify a non-Ia subtype. Type II-P identification was based on a postmaximum linear fit (in magnitudes per
day) in each band. From the training sample, the resulting slope in each passband was used to define a probability
space.

InCA. — This method labeled supernovae by performing classification on a lower-dimensional representation of the
SN light curves without relying on the use of templates or measured physical parameters such as stretch and color.
Specifically, the diffusion map approach to nonlinear dimensionality reduction (Richards et al. 2009) was utilized.
Using these lower-dimensional objects, well-established methods for classification were implemented to estimate the
type for each unknown SN.

The diffusion map was based on a pairwise distance measure over all of the observed light curves and bands. This 
distance matrix was then smoothed and transformed into diffusion space, providing the dimensionality reduction and 
possibly illuminating structure hidden in the original representation. 

To compute these distances, a regression spline was first fit to each SN light curve in each filter. This allowed each
SN to be represented as fluxes (and errors) on 1-day intervals. The time axis was shifted so that the observer-frame
time of peak r-band brightness was the same for each SN and the fluxes were normalized so that each SN has the
same maximum r-band flux. These steps were performed to ensure that the subsequent steps capture differences in
the shapes and colors of each light curve. A potential weakness, however, was that using the observer-frame r-band
as a reference does not match the peak colors and epochs in the rest frame. Using the normalized spline fit from each
band of each light curve, the distance between SNe i and j in band b was defined as

$$ d^b_{ij} = \frac{1}{\Delta T_{ij}} \left( \sum_e \frac{[F^b_{ie} - F^b_{je}]^2}{(\sigma^b_{ie})^2 + (\sigma^b_{je})^2} \right)^{1/2}, \qquad ({\rm A1}) $$

where $\Delta T_{ij}$ is the amount of overlap time (days) between the two SN light curves, $F^b_{ie}$ is the spline-fitted flux of
SN i in band b at epoch e, $\sigma^b_{ie}$ is the fitted error, and the epoch index e runs over the overlapping time bins. The
distance between each pair of SNe was constructed as the linear (not quadratic) sum of $d^b_{ij}$ over bands, $d_{ij} = \sum_b d^b_{ij}$.
Next, the distance matrix $d_{ij}$ was smoothed and transformed into an m-dimensional representation of each SN that
best preserves the relationships between each pair of SNe in the context of a diffusion process over the data. This
lower-dimensional representation was used (with m=50) in conjunction with the random forest classification method
(Breiman 2001) to estimate the type of each SN based on the set of training SNe.
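The pairwise distance can be sketched as below, assuming it has the form $d^b_{ij} = \Delta T_{ij}^{-1} [\sum_e (F^b_{ie} - F^b_{je})^2 / ((\sigma^b_{ie})^2 + (\sigma^b_{je})^2)]^{1/2}$ and that each light curve has already been resampled onto a common 1-day grid, with NaN marking days without coverage. The array layout and function names are assumptions for illustration; the spline fitting, smoothing, and diffusion-map steps are not shown.

```python
import numpy as np


def band_distance(flux_i, err_i, flux_j, err_j):
    """Distance between two SNe in one band on a shared 1-day grid."""
    flux_i, err_i, flux_j, err_j = (
        np.asarray(a, float) for a in (flux_i, err_i, flux_j, err_j)
    )
    # Overlapping epochs: days on which both light curves are defined.
    overlap = np.isfinite(flux_i) & np.isfinite(flux_j)
    n_days = int(overlap.sum())  # overlap time in days (1-day bins)
    if n_days == 0:
        return float("nan")
    diff2 = (flux_i[overlap] - flux_j[overlap]) ** 2
    var = err_i[overlap] ** 2 + err_j[overlap] ** 2
    return float(np.sqrt(np.sum(diff2 / var)) / n_days)


def total_distance(sn_i, sn_j):
    """Linear sum of per-band distances; sn_* map band -> (flux, err)."""
    total = 0.0
    for band in sn_i.keys() & sn_j.keys():
        d = band_distance(sn_i[band][0], sn_i[band][1],
                          sn_j[band][0], sn_j[band][1])
        if np.isfinite(d):
            total += d
    return total
```

Evaluating `total_distance` for every pair of SNe gives the matrix that is subsequently smoothed and mapped into diffusion space.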

JEDI KDE. — The light curve for each filter was fit to a modified $\Gamma$-distribution function with five parameters.
The four filters and redshift resulted in 21 parameters. A Gaussian was constructed around each 21-parameter point
with a variance related to the density of points in its vicinity. The sum of these Gaussians over the spectroscopic
training subset constitutes the Kernel Density Estimator (KDE). A relative probability of being a type Ia or non-Ia
SN for any set of 21 parameters was obtained from the Ia and non-Ia KDEs. A selection cut on the KDE probabilities
was used to make classifications.
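The sketch below illustrates the relative-probability idea with a simplified KDE. It uses a single fixed, isotropic bandwidth for every training point, whereas the entry described above used a variance that adapts to the local density; that simplification, the function names, and the array layout are assumptions made here.

```python
import numpy as np


def kernel_sum(point, training, bandwidth):
    """Sum of Gaussian kernels centered on the training points."""
    diff = (np.asarray(training, float) - np.asarray(point, float)) / bandwidth
    d2 = np.sum(diff ** 2, axis=1)
    return float(np.sum(np.exp(-0.5 * d2)))


def ia_relative_probability(point, train_ia, train_non_ia, bandwidth):
    """Relative probability that `point` is an SN Ia, from the two KDEs."""
    k_ia = kernel_sum(point, train_ia, bandwidth)
    k_non = kernel_sum(point, train_non_ia, bandwidth)
    total = k_ia + k_non
    # Far from all training points both sums underflow; return 0.5 then.
    return k_ia / total if total > 0 else 0.5
```

Because the same bandwidth is used for both classes, the Gaussian normalization constants cancel in the ratio. A classification then follows from a cut on the returned probability.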

JEDI boost. — This method used a supervised learning algorithm for classifying high-dimensional, nonlinear data
(Hastie et al. 2009). The idea was to combine decisions from a group of weak classifiers to make a more informed
decision. This algorithm used the 21 parameters from the light-curve fit, plus the two KDE probabilities. The tree
depth was 3, and the number of trees was 2000.

JEDI-Hubble. — The spectroscopic training subset was used to construct a Hubble diagram and a two-dimensional
KDE was constructed for the type Ia and non-Ia SNe. This method is similar to that of the Portsmouth-Hubble entry
which used $\chi^2$ statistics instead of a KDE.

JEDI combo. — This method combined the KDE probabilities from the JEDI-Boost and JEDI-Hubble methods. 

MGU+DU-1. — The spectroscopic training subset was used to estimate light-curve slopes (mag/day) in each filter
in four separate observer-frame regions relative to the epoch of peak brightness: -25 to -1, 1 to 25, 20 to 75 and
60 to 110 days. Redshift information was not used to translate these slopes into the rest frame, and each filter was
treated independently so that color information was not used either. The slopes for each SN were then compared with
the expected slopes for each SN type using a "difference boosting neural network" (DBNN; Philip & Joseph 2000).
If the same class was predicted in three or more filters, that class was used. In case of a tie, where two classes were
each predicted by two filters, the product of the confidences was used to determine the class, with the one with the
higher product winning. If there were no predictions, or if several classes were predicted by one filter each, the SN was
rejected.

MGU+DU-2. — This method was nearly the same as that for MGU+DU-1, except that a machine learning method
called random forests (Breiman 2001) was used to determine the predictive model.
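The rule used by the MGU+DU entries to combine the per-filter predictions can be sketched as follows. The input layout (a mapping from filter name to a `(class, confidence)` pair, or `None` when a filter gives no prediction) is an assumption, and cases the description leaves unspecified, such as one class predicted by two filters and two other classes by one filter each, are treated here as rejections.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def combine_filter_votes(
    predictions: Dict[str, Optional[Tuple[str, float]]]
) -> Optional[str]:
    """Return the adopted SN class, or None if the SN is rejected."""
    votes: Dict[str, List[float]] = defaultdict(list)
    for pred in predictions.values():
        if pred is not None:
            sn_class, confidence = pred
            votes[sn_class].append(confidence)

    # Three or more filters agree: accept that class.
    for sn_class, confs in votes.items():
        if len(confs) >= 3:
            return sn_class

    # Two classes with two filters each: larger confidence product wins.
    pairs = [c for c, confs in votes.items() if len(confs) == 2]
    if len(pairs) == 2:
        return max(pairs, key=lambda c: votes[c][0] * votes[c][1])

    # No predictions, every class predicted by one filter only, or any
    # case not covered by the description (assumption: reject).
    return None
```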

Portsmouth $\chi^2$. — This classification was based on the r-band $\chi^2$ from SALT-II light-curve fits (Guy et al. 2007).
The $\chi^2_r$ cut was determined by optimizing the $C_{\rm FoM-Ia}$ on the training sample using the false discovery rate statistic
(Miller et al. 2001). The only selection requirement was that the SALT-II fit does not fail or return pathological values.

Portsmouth-Hubble. — For the spectroscopically confirmed subset, a Hubble diagram (HD) was generated by the
SALT-II light-curve fits. This HD was then fit to a fourth-order polynomial, resulting in an expected HD curve that has
no assumptions about cosmological parameters. For the unconfirmed sample, a $\chi^2$ was computed for each SN based
on the proximity of the distance modulus to the expected HD curve. The r-band $\chi^2$ from the previous entry was not
used.
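A minimal sketch of the polynomial Hubble-diagram test, using NumPy's `Polynomial.fit` for the fourth-order fit. The function names, the unweighted fit, and the use of a per-SN distance-modulus uncertainty in the $\chi^2$ are assumptions for illustration, not details taken from the entry.

```python
import numpy as np


def fit_hubble_curve(z_conf, mu_conf, order=4):
    """Fit a polynomial mu(z) to the confirmed-subset Hubble diagram."""
    return np.polynomial.Polynomial.fit(
        np.asarray(z_conf, float), np.asarray(mu_conf, float), deg=order
    )


def hubble_chi2(z, mu, mu_err, curve):
    """Per-SN chi^2 of the distance modulus relative to the fitted curve."""
    z = np.asarray(z, float)
    mu = np.asarray(mu, float)
    mu_err = np.asarray(mu_err, float)
    return ((mu - curve(z)) / mu_err) ** 2
```

SNe from the unconfirmed sample with a small `hubble_chi2` lie close to the expected curve and would be tagged as SNe Ia; the threshold is a free choice.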

Poz2007 RAW. — The SN automated Bayesian classifier (SN-ABC; Poznanski et al. 2007) was used without
any modifications. The light-curve templates included one SN Ia (no stretch or color dependence) and the II-P SED
template from Nugent.

Poz2007 OPT. — SN-ABC was used as in the previous entry, and included selection cuts based on optimizing the
figure of merit (§4) for the spectroscopically confirmed subset.

Rodney. — The method of "Supernova Ontology with Fuzzy Templates" (SOFT; Rodney & Tonry 2009, 2010)
was used with three significant adjustments. First, the spectroscopically confirmed subset
was used to define a redshift-dependent probability for each class. Next, instead of fixing the extinction parameter
$R_V$, it was allowed to take three discrete values: 1.3, 2.2, or 3.1. Finally, the host-galaxy photo-z was included as a
prior for the SNPhotCC/HOSTZ. To reduce the processing time without dramatically affecting the results, the
spectroscopic training set from the challenge was used to reduce the SOFT template library from ~40 templates down to
20.

Sako. — This entry used an improved version of the method used to classify objects during the SDSS-II SN Survey
(Sako et al. 2008). A $\chi^2$ was computed between the observed photometry and each SN from a large set of templates
that included SN Ia and non-Ia light-curve models. For the SN Ia models there were 5 parameters defining a grid of
45 million templates: 1) redshift, 2) rest-frame V-band extinction, 3) time of maximum light in B band, 4) shape-luminosity
parameter $\Delta m_{15}$ (Phillips 1993), and 5) distance modulus. Flat priors were assumed for all parameters
except when the host-galaxy redshift was available. The non-Ia templates were based on spectroscopically confirmed
SDSS-II SNe including type Ibc (2005hl†, 2005hm*, 2006fo*, and 2006jo*) and type II (2004hx*, 2005lc†, 2005gi*,
and 2006jl*). The star (dagger) superscript indicates that this SN was (was not) used in the SNPhotCC (see Table 2).
Although the choice and development of these templates were completely independent of the SNPhotCC, this method
clearly had an advantage in using a few of the same templates that were used in the SNPhotCC.

SNe with large $\chi^2$ were rejected. The final SN classification was based on the largest Bayesian probability among
the calculated probabilities to be a type Ia, Ibc or II. This algorithm is similar to the one presented in Poznanski et al.
(2007) except that non-Ia SNe were classified into subtypes Ibc and II using an extended set of templates, the distance
modulus was allowed to vary (instead of being computed from the SN photo-z and an assumed cosmology) and the
shape parameter was allowed to vary for SN Ia light-curve models.

SNANA Cuts. — Two of the challenge organizers (SK & RK) created a submission using the SNANA-MLCS light-curve
fitter along with selection cuts that were guessed long before the SNPhotCC. We did not optimize the cuts, or use
our in-depth knowledge of how the SNPhotCC was generated. The primary cut required that the MLCS light-curve fit
probability be above 10%. The other selection requirements were 1) at least one measurement before the epoch of
peak brightness and another 10 days later in the rest frame, 2) maximum S/N > 10 and 3) two additional filters with
maximum S/N > 5. The photo-z estimates used the method described in Kessler et al. (2010).
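The selection requirements listed above can be written as a single predicate. In this sketch the `FitSummary` container and its field names are hypothetical, and "another 10 days later" is interpreted as at least one measurement 10 or more rest-frame days after peak; neither is taken from the SNANA code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FitSummary:
    fit_prob: float                      # MLCS light-curve fit probability
    rest_epochs: List[float]             # rest-frame days relative to peak
    max_snr_by_filter: Dict[str, float]  # maximum S/N in each filter


def passes_snana_cuts(s: FitSummary) -> bool:
    """Apply the fit-probability, sampling, and S/N requirements."""
    # Primary cut: fit probability above 10%.
    if s.fit_prob <= 0.10:
        return False
    # 1) One measurement before peak and one at least 10 days after.
    if not any(t < 0.0 for t in s.rest_epochs):
        return False
    if not any(t >= 10.0 for t in s.rest_epochs):
        return False
    # 2) The best filter must reach S/N > 10.
    snrs = sorted(s.max_snr_by_filter.values(), reverse=True)
    if not snrs or snrs[0] <= 10.0:
        return False
    # 3) Two additional filters must reach S/N > 5.
    return sum(1 for x in snrs[1:] if x > 5.0) >= 2
```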
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